Skip to content

Issue#1307 json string read write#1310

Closed
padam-prakash wants to merge 12 commits into
zinggAI:mainfrom
padam-prakash:Issue#1307JsonStringReadWrite
Closed

Issue#1307 json string read write#1310
padam-prakash wants to merge 12 commits into
zinggAI:mainfrom
padam-prakash:Issue#1307JsonStringReadWrite

Conversation

@padam-prakash
Copy link
Copy Markdown
Contributor

Remove legacy Python export/peek model phases and related resources, update client wiring and argument handling. ZinggOptions (assess/peek/export) and the Spark Python phase runner mapping were removed; python/exportModel script, Python-related helpers (argument write/JSON methods) and many test/resources/docs for the peek/export flows were deleted. fixes #1307 . fixes #1309

padam-prakash and others added 12 commits April 24, 2026 11:10
Corrects a typo and contraction in adv-matchtypes.md ("transfomration" -> "transformation", "dont" -> "don't"). Adds a footnote to field-definitions.md clarifying what Zingg Enterprise is and linking to the product comparison page for feature details.
* Fix typo and add Zingg Enterprise footnote

Corrects a typo and contraction in adv-matchtypes.md ("transfomration" -> "transformation", "dont" -> "don't"). Adds a footnote to field-definitions.md clarifying what Zingg Enterprise is and linking to the product comparison page for feature details.

* Merge remote-tracking branch 'upstream/main'

* Throw on invalid match type; update test to write args

Change MatchTypes to throw an Exception when a match type name is not found instead of returning null, preventing silent null returns. Update TestArguments to call argumentService.writeArguments(...) to persist the Arguments before reloading them

* Use IllegalArgumentException in getByName

updated throw IllegalArgumentException instead of the generic Exception.
* Add ClickHouse JDBC documentation

* ix

---------

Co-authored-by: Ishan Arora <ishanarora@Ishans-MacBook-Air.local>
Remove legacy Python export/peek model phases and related resources, update client wiring and argument handling.  ZinggOptions (assess/peek/export) and the Spark Python phase runner mapping were removed; python/exportModel script, Python-related helpers (argument write/JSON methods) and many test/resources/docs for the peek/export flows were deleted
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes legacy Python/CLI-era model phases (assess/peek/export) and related Python “Arguments JSON string/file” helpers, updating the Java client wiring accordingly, and cleans up associated tests/resources/docs. It also adds initial ClickHouse JDBC connector documentation and bumps the Docker image to Zingg 0.6.0 artifacts.

Changes:

  • Removed deprecated phases (ASSESS_MODEL / PEEK_MODEL / EXPORT_MODEL) and their Spark executor wiring/resources.
  • Removed Python Arguments JSON-string/JSON-file helper APIs and the exportModel Python phase/script; updated argument writing to be file-based.
  • Added ClickHouse JDBC connector doc and updated Dockerfile to use Zingg 0.6.0 distribution.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/testFebrl/testArgs.py Removes Python tests for deprecated Arguments JSON helpers and peek-model phase default.
spark/core/src/test/resources/testPeekModel/test.csv Deletes peek-model test dataset resource.
spark/core/src/test/resources/testPeekModel/config.json Deletes peek-model test config resource.
spark/core/src/test/java/zingg/spark/core/executor/TestPeekModel.java Removes obsolete peek-model integration test.
spark/core/src/main/java/zingg/spark/core/executor/SparkZFactory.java Removes mapping for deprecated peek-model phase runner.
spark/core/src/main/java/zingg/spark/core/executor/SparkPythonPhaseRunner.java Deletes generic Spark Python phase runner used by removed phases.
spark/client/src/test/java/zingg/spark/client/TestArguments.java Updates test to write arguments to JSON file via writeArguments(...).
python/zingg/client.py Removes Python-side Arguments JSON read/write/copy helpers; adjusts ClientOptions defaults/arg parsing.
python/phases/exportModel.py Deletes legacy exportModel Python phase script.
docs/setup/training/exportLabeledData.md Removes documentation page for exportModel flow.
docs/setup/training/addOwnTrainingData.md Removes reference to exporting labeled data for reuse.
docs/connectors/jdbc/clickhouse.md Adds ClickHouse JDBC connector configuration documentation.
docs/SUMMARY.md Removes nav entry for exporting labeled data doc.
common/core/src/test/resources/testPeekModel/test.csv Deletes common-core peek-model test dataset.
common/core/src/test/resources/testPeekModel/config.json Deletes common-core peek-model test config.
common/client/src/test/java/zingg/common/client/TestArguments.java Import ordering/assertion import tightening (no functional change).
common/client/src/main/java/zingg/common/client/options/ZinggOptions.java Removes deprecated phase constants from supported options list.
common/client/src/main/java/zingg/common/client/arguments/ArgumentServiceImpl.java Switches argument writing to WriterType.FILE (file-based writer).
common/client/src/main/java/zingg/common/client/MatchTypes.java Changes invalid match type handling to throw IllegalArgumentException instead of returning null.
common/client/src/main/java/zingg/common/client/Client.java Removes silent fallback to peek-model phase; logs and rethrows on invalid phase.
Dockerfile Updates ZINGG_HOME and downloaded distribution to 0.6.0 (Spark build referenced updated).
Comments suppressed due to low confidence (2)

python/zingg/client.py:752

  • ClientOptions.__init__ uses print(...) statements, which will pollute stdout for library consumers and test output. Since this is a library API, prefer LOG.debug(...) (or remove these statements entirely) so output can be controlled via logging configuration.
        print(argsSent)
        if argsSent == None:
            args = []
        else:
            args = argsSent.copy()
        if self.PHASE not in args:
            args.append(self.PHASE)
        if self.LICENSE not in args:
            args.append(self.LICENSE)
            args.append("zinggLic.txt")
        if self.EMAIL not in args:
            args.append(self.EMAIL)
            args.append("zingg@zingg.ai")
        if self.CONF not in args:
            args.append(self.CONF)
            args.append("dummyConf.json")
        print("arguments for client options are ", args)
        self.co = getJVM().zingg.common.client.ClientOptions(args)

spark/client/src/test/java/zingg/spark/client/TestArguments.java:68

  • This test now writes to a hard-coded path under /tmp but never cleans it up. That can make tests flaky on restricted CI runners and can leave artifacts behind. Prefer using a JUnit temp directory/file (e.g., @TempDir) and deleting the file after the assertion, or use Files.createTempFile(...) scoped to the test.
			args.setCollectMetrics(true);
			args.setModelId("500");
			argumentService.writeArguments("/tmp/configFromArgObject.json", args);

			//reload the same config file to check if deserialization is successful
			IArguments newArgs = argumentService.loadArguments("/tmp/configFromArgObject.json");
			assertEquals(newArgs.getModelId(), "500", "Model id is different");

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread python/zingg/client.py
@@ -711,29 +703,6 @@ def createArgumentsFromJSON(fileName, phase):
obj.args = getJVM().zingg.common.client.argumentst.ArgumentServiceImpl().loadArguments(fileName)
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createArgumentsFromJSON is calling getJVM().zingg.common.client.argumentst.ArgumentServiceImpl(). The argumentst package looks like a typo and will fail at runtime when Python tries to load arguments from a JSON config. Please change it to the correct JVM path for ArgumentServiceImpl under zingg.common.client.arguments (and consider adding/adjusting a unit test that exercises this call).

Suggested change
obj.args = getJVM().zingg.common.client.argumentst.ArgumentServiceImpl().loadArguments(fileName)
obj.args = getJVM().zingg.common.client.arguments.ArgumentServiceImpl().loadArguments(fileName)

Copilot uses AI. Check for mistakes.
Comment thread python/zingg/client.py
Comment on lines 740 to 741
if self.PHASE not in args:
args.append(self.PHASE)
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing the default phase value, ClientOptions.__init__ still appends --phase when it’s missing but does not append any phase value. This produces an invalid arg list and the JVM ClientOptions parser will throw “Missing argument for option '--phase'” for ClientOptions() and for any call site that doesn’t provide --phase explicitly (several tests construct ClientOptions() this way). Either stop auto-inserting --phase unless a value is also provided, or raise a clear Python-side error when --phase is omitted.

Suggested change
if self.PHASE not in args:
args.append(self.PHASE)

Copilot uses AI. Check for mistakes.
Comment thread Dockerfile
WORKDIR /zingg-0.5.0
RUN curl --location https://github.com/zinggAI/zingg/releases/download/v0.5.0/zingg-0.5.0-spark-3.5.0.tar.gz | \
WORKDIR /zingg-0.6.0
RUN curl --location https://github.com/zinggAI/zingg/releases/download/v0.6.0/zingg-0.6.0-spark-3.6.0.tar.gz | \
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Docker base image is still apache/spark:3.5.0-python3, but the downloaded Zingg distribution is the ...-spark-3.6.0.tar.gz build. Mixing a Spark 3.6 build with a Spark 3.5 runtime can lead to binary incompatibilities at runtime. Please align these by either updating the base image to Spark 3.6 (if intended) or downloading the Spark 3.5 build of the Zingg tarball.

Suggested change
RUN curl --location https://github.com/zinggAI/zingg/releases/download/v0.6.0/zingg-0.6.0-spark-3.6.0.tar.gz | \
RUN curl --location https://github.com/zinggAI/zingg/releases/download/v0.6.0/zingg-0.6.0-spark-3.5.0.tar.gz | \

Copilot uses AI. Check for mistakes.
Comment thread docs/SUMMARY.md
Comment on lines 43 to 48
* [Using Pre-existing Training Data](setup/training/addOwnTrainingData.md)
* [Updating Labeled Pairs](updatingLabels.md)
* [Documenting The Training Data](stepbystep/createtrainingdata/generatingdocumentation.md)
* [Exporting Labeled Data](setup/training/exportLabeledData.md)
* [Model Difference](stepbystep/createtrainingdata/modeldiff.md)
* [Ensuring Scalability](verifyBlocking.md)
* [Building And Saving The Model](setup/train.md)
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new connector doc was added at docs/connectors/jdbc/clickhouse.md, but it isn’t linked from docs/SUMMARY.md. With mdBook/GitBook-style navigation, unlisted pages are typically undiscoverable (and may not be built). Please add ClickHouse under the JDBC section (near the existing Postgres/MySQL entries) so the page is reachable.

Copilot uses AI. Check for mistakes.
@padam-prakash
Copy link
Copy Markdown
Contributor Author

Duplicate #1311

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deprecate Model phases JSON String Read and Write Methods

4 participants